@dzherb dzherb commented May 15, 2025

Thanks to @JelleZijlstra for pointing out a possible fix! I added an additional check, but the initial draft was nearly complete.

Comment on lines +770 to +771
if not cls_module:
    return False

It is interesting that we're paranoid about the dataclass's module not being imported only on this branch and not on the branch above. I'm not suggesting a change, but a comment could be useful if you know why we only care in the else branch and not in the if not module_name branch.

Author

I was following the checks that were already present in the function. In the else branch, there was a check for if module:

module = sys.modules.get(cls.__module__)
if module and module.__dict__.get(module_name) is a_module:
    ns = sys.modules.get(a_type.__module__).__dict__

I’d actually consider removing that check — it does seem unnecessary. At least, I can’t think of a case where cls.__module__ would be missing from sys.modules. And the tests are still passing.
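
For reference, one way a class can end up with a `__module__` that is absent from `sys.modules` is dynamic creation with an arbitrary module name. The sketch below is only an illustration: the class name and module name are made up, and it does not claim that this is the case the original `if module:` check was written for.

```python
import sys

# Hypothetical module name, chosen so that it is never actually imported.
FAKE_MODULE = "no_such_module_for_demo"

# Create a class dynamically and give it a __module__ that was never imported.
Dynamic = type("Dynamic", (), {"__module__": FAKE_MODULE})

# The lookup finds nothing, so .get() returns None instead of a module.
module = sys.modules.get(Dynamic.__module__)
print(module)

# Guarded access, in the spirit of the existing `if module:` check.
# Without the guard, `module.__dict__` would raise AttributeError here.
ns = module.__dict__ if module else None
print(ns)
```

Whether such classes are worth supporting is a separate question; the example only shows that the lookup can return `None`.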

Yes, I saw the original code. My suggestion would be to hoist module = sys.modules.get(cls.__module__) out of the if..else. If you want to preserve the rather paranoid sanity check alongside the hoisted assignment, you could. Something like:

module = sys.modules.get(cls.__module__)
if module:  # not sure that we need to be this paranoid
    ns = module.__dict__
if module_name:
    ...

if (
    isinstance(a_type_module, types.ModuleType)
    # Handle cases when a_type is not defined in
    # the referenced module, e.g. 'dataclasses.ClassVar[int]'

InitVar lives in dataclasses, ClassVar lives in typing. Using an example of dataclasses.ClassVar is likely to cause confusion.
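
A quick way to check where each name lives, using only the standard library (a small illustration, not part of the PR):

```python
import dataclasses
import typing

# InitVar is a class defined in the dataclasses module.
print(dataclasses.InitVar.__module__)

# ClassVar is exposed by typing; InitVar is not.
print(hasattr(typing, "ClassVar"))
print(hasattr(typing, "InitVar"))
```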

return False

ns = None
module_name = match.group(1)

There is an edge case that is not handled by this: foo.typing_filtered.ClassVar. The regex only accounts for one module level. While it is probably uncommon to import foo.typing_filtered without aliasing it, it is possible.

I don't want to change the world for a simple bug fix, but it seems like partitioning on . is more robust (and probably more performant) than using the current regex.
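
To make the edge case concrete, here is a small sketch. The pattern below is an assumption: a single-level `module.name` regex of the same shape as the one under discussion, not necessarily the exact pattern in `dataclasses.py`. The helper `split_annotation` is hypothetical and only shows what splitting on the last dot might look like.

```python
import re

# Assumed single-level pattern: an optional "module." prefix, then a name.
SINGLE_LEVEL_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")


def split_annotation(annotation):
    """Hypothetical alternative: split a dotted name on its last dot."""
    # Drop any subscript such as "[int]" before splitting.
    dotted = annotation.partition("[")[0].strip()
    module_path, _, type_name = dotted.rpartition(".")
    return module_path, type_name


annotation = "foo.typing_filtered.ClassVar[int]"

# The regex stops after one level: it never reaches "ClassVar".
match = SINGLE_LEVEL_RE.match(annotation)
print(match.group(1), match.group(2))  # foo typing_filtered

# Splitting on the last dot keeps the whole module path together.
print(split_annotation(annotation))       # ('foo.typing_filtered', 'ClassVar')
print(split_annotation("ClassVar[int]"))  # ('', 'ClassVar')
```

Resolving a multi-level module path would still need a walk through nested namespaces, which this sketch does not attempt.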

Author

While I agree that splitting on a dot would be faster, there are existing tests that cover cases like dataclasses.InitVar.[int] and dataclasses.InitVar+. If we want to support multiple module levels, we should extend the current regex pattern rather than replace it.
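
For context, the strings mentioned above behave differently under the two approaches. This sketch assumes a single-level pattern of the same shape as the existing regex (it may not match the real one exactly), and the naive last-dot split is shown only for contrast.

```python
import re

# Assumed single-level pattern: an optional "module." prefix, then a name.
SINGLE_LEVEL_RE = re.compile(r"^(?:\s*(\w+)\s*\.)?\s*(\w+)")

results = {}
for annotation in ("dataclasses.InitVar.[int]", "dataclasses.InitVar+"):
    # The regex reads the leading "module.name" and ignores trailing junk.
    match = SINGLE_LEVEL_RE.match(annotation)
    regex_result = (match.group(1), match.group(2))

    # A naive split on the last dot is thrown off by the trailing junk.
    head, _, tail = annotation.rpartition(".")
    naive_result = (head, tail)

    results[annotation] = (regex_result, naive_result)
    print(annotation, regex_result, naive_result)
```

A splitting approach would therefore need its own validation step if these inputs are to keep their current behavior.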

Member

I think it's worth creating a new issue to track more levels of module nesting. And I think it's a mistake to test for dataclasses.InitVar.[int] and other invalid strings.

This bug was actually caught by __.typx.ClassVar in the real world. I typically stuff all of my common imports, including typing_extensions as typx, into an internals subpackage (__) so that I can do from . import __. Reduces module namespace pollution and lets me avoid setting __all__ to define module interfaces. But, I understand that my practice may not be common.

That said, any fix to multi-level traversal here or another PR is going to affect the code that is actually being fixed. Imo, it would make more sense to fix both together holistically (to save developer effort), if there is any interest in actually fixing the multi-level traversal.

Comment on lines +774 to +780
if (
    isinstance(a_type_module, types.ModuleType)
    # Handle cases when a_type is not defined in
    # the referenced module, e.g. 'dataclasses.ClassVar[int]'
    and a_type_module.__dict__.get(type_name) is a_type
):
    ns = sys.modules.get(a_type.__module__).__dict__

I think this could be replaced with:

return (
    isinstance(a_type_module, types.ModuleType)
    and is_type_predicate(a_type_module.__dict__.get(type_name), a_module))
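
A self-contained sketch of how such a predicate-based check might behave when a module re-exports `ClassVar`. Everything here is illustrative: `is_classvar` is a simplified stand-in for the predicate, `annotation_is_classvar` is a hypothetical wrapper, and `typing_filtered` is a throwaway module built for the demo, not real `dataclasses` code.

```python
import types
import typing


def is_classvar(a_type, typing_module):
    # Simplified stand-in for the predicate passed to the function under review.
    return a_type is typing_module.ClassVar


def annotation_is_classvar(cls_namespace, module_name, type_name):
    # Look up the module named in the annotation, then apply the predicate
    # to whatever that module exposes under type_name.
    a_type_module = cls_namespace.get(module_name)
    return (
        isinstance(a_type_module, types.ModuleType)
        and is_classvar(a_type_module.__dict__.get(type_name), typing)
    )


# A made-up module that re-exports ClassVar, like a typing wrapper would.
typing_filtered = types.ModuleType("typing_filtered")
typing_filtered.ClassVar = typing.ClassVar

# Stand-in for the namespace of the module that defines the dataclass.
namespace = {"typing": typing, "tf": typing_filtered, "not_a_module": object()}

print(annotation_is_classvar(namespace, "typing", "ClassVar"))        # True
print(annotation_is_classvar(namespace, "tf", "ClassVar"))            # True
print(annotation_is_classvar(namespace, "tf", "InitVar"))             # False
print(annotation_is_classvar(namespace, "not_a_module", "ClassVar"))  # False
```

The re-exporting module is not `typing` itself, yet the check succeeds because it compares the object found under `type_name` rather than the identity of the module.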

Author

It looks like we can indeed replace it. But if we go with this version, the a_type parameter will no longer be used in the function body.

Also, I was thinking — in the original version, this line:

sys.modules.get(a_type.__module__)

could probably be replaced with a_module? Or am I missing something? It seems like a_type might not have been necessary from the start, and we should probably remove it.

Member

If you think the code can be simplified, please do so, then I'll review that version.

@@ -0,0 +1 @@
Fix bug where ``ClassVar`` string annotation in :func:`@dataclass <dataclasses.dataclass>` caused incorrect __init__ generation
Member

@ericvsmith ericvsmith May 19, 2025

I'm just starting to look at this PR, and will have more to say later. For now: I'd prefer this to say something more specific, like <some specific description> of ClassVar not recognized as typing.ClassVar. I'd like this because I'm sure we won't be fixing all instances of using ClassVar as string annotations, and we should mention the exact one being fixed here. Also, I don't want to mention __init__, because that's not the only method affected by this bug.

bedevere-app bot commented May 19, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
